Skip to content

perf: Optimize Utf8View string concat#21535

Merged
comphead merged 4 commits intoapache:mainfrom
neilconway:neilc/perf-string-concat-nulls
Apr 10, 2026
Merged

perf: Optimize Utf8View string concat#21535
comphead merged 4 commits intoapache:mainfrom
neilconway:neilc/perf-string-concat-nulls

Conversation

@neilconway
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

Optimize || for Utf8View values in two ways:

  1. We previously checked two Options for every row, to check for NULL values. It is faster to precompute the NULL bitmap and then just check a single bitmap. Annoyingly, StringViewBuilder still requires constructing the result NULL bitmap itself incrementally, but fixing that would be a more invasive change.
  2. We previously constructed the result value with write!({l}, {r}), which goes through the fmt machinery and is very slow. It is more efficient to just write_str twice.

Benchmarks (Arm64):

  - concat_utf8view/concat/nulls_0: 289.9µs → 140.2µs (-51.6%)
  - concat_utf8view/concat/nulls_10: 293.5µs → 154.7µs (-47.3%)
  - concat_utf8view/concat/nulls_50: 197.9µs → 95.0µs (-52.0%)

What changes are included in this PR?

  • Add benchmark for string concatenation
  • Implement optimizations described above

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Apr 10, 2026
Copy link
Copy Markdown
Contributor

@metegenez metegenez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, this is similar NULL handling to your previous #21420, if i am not mistaken?

@neilconway
Copy link
Copy Markdown
Contributor Author

LGTM, this is similar NULL handling to your previous #21420, if i am not mistaken?

Thanks for the review! I have a bunch of PRs out that all follow a similar pattern, replacing per-row NULL handling with bulk NULL bitmap operations:

etc.

buffer.clear();
write!(&mut buffer, "{left}{right}")
.expect("writing into string buffer failed");
buffer.push_str(l);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

let nulls = NullBuffer::union(left.nulls(), right.nulls());

for i in 0..left.len() {
if nulls.as_ref().is_some_and(|n| n.is_null(i)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need to check nulls.as_ref().is_some in loop?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could hoist this outside the loop, although

  1. This is a very common pattern in the code base
  2. I'd think the branch predictor should be able to handle this very effectively
  3. We'd need to duplicate the loop body to handle the two cases, no?

I'd say the current approach is okay but lmk if you disagree.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked with godbolt and and yeah, looks like compiler rewrote the thing correctly so it identifies is_some outside of the loop 😮

Copy link
Copy Markdown
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @neilconway

@comphead comphead added this pull request to the merge queue Apr 10, 2026
Merged via the queue into apache:main with commit 0626ca3 Apr 10, 2026
38 checks passed
@neilconway neilconway deleted the neilc/perf-string-concat-nulls branch April 10, 2026 20:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize NULL handling for string concatentation

3 participants